Wire tier-2b Langfuse Generation fixtures by chris-colinsky · Pull Request #186 · LunarCommand/openarmature-python

chris-colinsky · 2026-06-24T19:36:32Z

Summary

Completes the Langfuse tier of the conformance-harness fixture catch-up: wire
the two Langfuse Generation fixtures into the YAML harness. Test-only, no
library change, no pin bump.

Wired (2)

Moved from _UNIT_TESTED_FIXTURES to _SUPPORTED_FIXTURES:

- 023: Generation rendering (model / modelParameters / usage / input-output
  metadata) plus the payload-truncation fallthrough (input becomes the raw
  marker-bearing string once it exceeds the byte cap).
- 024: Prompt-entity linkage, both the present case (a backend exposing a
  Langfuse Prompt reference) and the absent case.
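The truncation fallthrough that 023 exercises can be sketched roughly as follows. This is a minimal illustration, not the observer's code: `render_generation_input` and the marker text are hypothetical, and the real marker shape lives in the observer's `_TRUNCATION_MARKER_TEMPLATE`, which this PR does not show.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical marker shape, assumed for illustration only.
MARKER_TEMPLATE = "[truncated: {dropped} bytes omitted]"


def render_generation_input(
    messages: list[dict[str, Any]], byte_cap: int
) -> list[dict[str, Any]] | str:
    """Return the native message list under the cap, else a raw truncated string."""
    encoded = json.dumps(messages).encode("utf-8")
    if len(encoded) <= byte_cap:
        # Under the cap: the input keeps its native message-list shape.
        return messages
    # Over the cap: fall through to a raw string that carries the marker.
    kept = encoded[:byte_cap].decode("utf-8", errors="ignore")
    return kept + MARKER_TEMPLATE.format(dropped=len(encoded) - byte_cap)
```

A fixture-style check would then assert on the type of the rendered input: a list for short content, a marker-bearing string for content repeated past the cap.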

Harness machinery added

- _run_langfuse_generation_fixture builds a calls_llm graph, records into an
  InMemoryLangfuseClient under the fixture's disable_provider_payload /
  payload_byte_cap config, and asserts the Generation observation nested under
  the node span.
- _assert_langfuse_generation_fields covers model / modelParameters / usage /
  prompt_entity_link and the two input shapes (native message list under the
  cap, raw truncated string with the marker over it). The placeholder-capable
  fields run through the value matcher.
- The value matcher gained nested-dict recursion so 024's metadata.prompt
  (with an inner <any-string> rendered_hash) matches.
- _materialize_typed_messages gained content_repeat synthesis (023), and
  _render_prompt_result carries a backend's Langfuse prompt reference into
  PromptResult.observability_entities, which the observer resolves into the
  Generation's prompt-entity link (024).

Testing

tests/conformance/test_observability.py: 72 passed, 40 skipped.
Full tests/: 1464 passed, 406 skipped.
ruff and pyright clean.

Move 023 (generation rendering + payload truncation) and 024 (prompt linkage) from _UNIT_TESTED_FIXTURES into _SUPPORTED_FIXTURES, driven through a LangfuseObserver + InMemoryLangfuseClient recorder. Completes the Langfuse tier of the fixture-harness catch-up; test-only, no library change, no pin bump. Adds a generation runner that asserts the Generation observation (model / modelParameters / usage / input-output payload + prompt-entity link) nested under the node span, plus content_repeat synthesis + payload_byte_cap truncation (023) and a prompt-backend Langfuse reference carried via PromptResult.observability_entities (024). The value-matcher gained nested-dict recursion for metadata.prompt. No deferrals.

Copilot

Pull request overview

This PR completes the YAML conformance-harness wiring for the Tier 2b Langfuse “Generation” fixtures (023/024), moving them from unit-tested-only coverage into the main fixture runner. The changes are test-only and extend the harness to validate Langfuse Generation rendering, truncation behavior, and prompt-entity linkage per the spec mapping.

Changes:

- Added a new Langfuse Generation fixture driver (_run_langfuse_generation_fixture) and Generation-field assertions integrated into the Langfuse observation-tree matcher.
- Extended the Langfuse value matcher to recurse into nested mappings so placeholder tokens can match inside nested objects (needed for fixture 024).
- Added content_repeat synthesis for typed messages and carried Langfuse prompt references via PromptResult.observability_entities.


Address review feedback on PR #186: the input_is_raw_string_with_marker check matched a bare "[truncated" substring, which could false-positive on arbitrary content. Tighten it to a regex for the full marker shape, mirroring the observer's _TRUNCATION_MARKER_TEMPLATE and consistent with the OTel marker_pattern approach.
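The difference between the two checks can be sketched as below. The marker shape and both function names are assumptions made for illustration; the real pattern follows the observer's `_TRUNCATION_MARKER_TEMPLATE`, which is not reproduced in this PR.

```python
import re

# Hypothetical marker shape, assumed for illustration only.
MARKER_PATTERN = re.compile(r"\[truncated: \d+ bytes omitted\]$")


def loose_marker_check(value: object) -> bool:
    """Old-style check: any string containing the bare prefix passes."""
    return isinstance(value, str) and "[truncated" in value


def strict_marker_check(value: object) -> bool:
    """Tightened check: the string must end with the full marker shape."""
    return isinstance(value, str) and MARKER_PATTERN.search(value) is not None
```

Ordinary message content that merely mentions "[truncated" passes the loose check but not the strict one, which is the false positive the review comment pointed at.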

Fold in the Python-side nuances from the spec's Tier 2 review:
- _assert_langfuse_observation_tree now disambiguates same-(type, name) sibling observations (032's per-instance "process" spans) by their scalar metadata rather than emission order, so the assertions can't bind the wrong sibling if the observer's emission order shifts.
- _run_invocation_id_case now asserts the fixture's top-level verbatim invocation_id clause (035/036) against the in-memory recorder's raw trace.id, so it isn't half-asserted across the OTel and Langfuse runners.
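One way to bind same-(type, name) siblings by scalar metadata instead of emission order is sketched below. The observation dict layout, the `instance_index` key used in the usage note, and the helper names are hypothetical; the harness's real matcher may be structured differently.

```python
from typing import Any

SCALAR_TYPES = (str, int, float, bool)


def scalar_metadata(obs: dict[str, Any]) -> dict[str, Any]:
    """Keep only the scalar-valued metadata entries of an observation."""
    metadata = obs.get("metadata") or {}
    return {k: v for k, v in metadata.items() if isinstance(v, SCALAR_TYPES)}


def bind_siblings(
    expected: list[dict[str, Any]], actual: list[dict[str, Any]]
) -> list[tuple[dict[str, Any], dict[str, Any]]]:
    """Pair each expected sibling with an unused actual one that agrees on
    type, name, and every expected scalar metadata value."""
    remaining = list(actual)
    pairs = []
    for exp in expected:
        want = scalar_metadata(exp)
        for index, act in enumerate(remaining):
            have = scalar_metadata(act)
            same_identity = (act["type"], act["name"]) == (exp["type"], exp["name"])
            if same_identity and all(k in have and have[k] == v for k, v in want.items()):
                pairs.append((exp, remaining.pop(index)))
                break
        else:
            raise AssertionError(f"no actual observation matches {exp!r}")
    return pairs
```

With two "process" spans that differ only in a metadata value such as `instance_index`, the pairing stays the same whether the recorder emitted them in order or reversed.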

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

+    # A regular NON-empty nested mapping (e.g. 024 metadata.prompt): recurse per
+    # key so inner tokens (rendered_hash: <any-string>) still apply. Subset over
+    # keys -- every expected key must be present and match; actual MAY carry
+    # extras. An empty expected dict falls through to exact equality below
+    # (rather than vacuously matching any mapping).
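The semantics described in that comment can be sketched as a small recursive matcher. This is an illustrative reconstruction, not the harness's function: `value_matches` is a hypothetical name, and only the `<any-string>` token from the PR description is modelled.

```python
from collections.abc import Mapping
from typing import Any

ANY_STRING = "<any-string>"  # placeholder token named in the PR description


def value_matches(expected: Any, actual: Any) -> bool:
    """Hypothetical matcher: placeholder, nested-subset, then exact equality."""
    # Placeholder token: any string value is accepted.
    if isinstance(expected, str) and expected == ANY_STRING:
        return isinstance(actual, str)
    # Non-empty nested mapping: recurse per key with subset semantics, so
    # every expected key must be present and match while extras are allowed.
    if isinstance(expected, Mapping) and expected:
        if not isinstance(actual, Mapping):
            return False
        return all(
            key in actual and value_matches(value, actual[key])
            for key, value in expected.items()
        )
    # Everything else, including an empty expected dict, needs exact equality.
    return expected == actual
```

Routing the empty dict to exact equality means an expected `{}` only matches an actual `{}`, rather than matching every mapping vacuously.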


+    graph, state_cls, provider = _build_simple_llm_graph(case, populate_caller_metadata=False)
+    client = InMemoryLangfuseClient()
+    cfg = cast("dict[str, Any]", case.get("langfuse_observer") or {})
+    lf_kwargs: dict[str, Any] = {"client": client}


Copilot AI review requested due to automatic review settings June 24, 2026 19:36

Copilot started reviewing on behalf of chris-colinsky June 24, 2026 19:36

Copilot AI reviewed Jun 24, 2026


Comment thread tests/conformance/test_observability.py

chris-colinsky added 2 commits June 24, 2026 12:50

Copilot AI review requested due to automatic review settings June 24, 2026 20:21

Copilot started reviewing on behalf of chris-colinsky June 24, 2026 20:21

Copilot AI reviewed Jun 24, 2026


chris-colinsky merged commit 122dcd2 into main Jun 24, 2026
6 checks passed

chris-colinsky deleted the chore/fixture-harness-tier-2b-generation branch June 24, 2026 20:25

chris-colinsky merged 3 commits into main from chore/fixture-harness-tier-2b-generation

2 participants
